814 research outputs found

Combining LiDAR Space Clustering and Convolutional Neural Networks for Pedestrian Detection

Author: Ekenel Hazım Kemal
Matti Damien
Thiran Jean-Philippe
Publication date: 17/10/2017

Pedestrian detection is an important component of autonomous vehicle safety, as well as of traffic and street surveillance. Extensive benchmarks exist on this topic, and it has been shown to be a challenging problem when applied to real use-case scenarios. In purely image-based pedestrian detection, state-of-the-art results have been achieved with convolutional neural networks (CNNs), and surprisingly few detection frameworks have been built upon multi-cue approaches. In this work, we develop a new pedestrian detector for autonomous vehicles that exploits LiDAR data in addition to visual information. In the proposed approach, LiDAR data is used to generate region proposals by processing the three-dimensional point cloud that it provides. These candidate regions are then further processed by a state-of-the-art CNN classifier that we have fine-tuned for pedestrian detection. We have extensively evaluated the proposed detection process on the KITTI dataset. The experimental results show that the proposed LiDAR space clustering approach provides a very efficient way of generating region proposals, leading to higher recall rates and fewer misses for pedestrian detection. This indicates that LiDAR data can provide auxiliary information for CNN-based approaches.
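The abstract does not say which clustering method or camera model the authors use, so the following is only a minimal sketch of the general idea: group nearby 3D points, then project each group into the image to obtain a candidate box for a classifier. The function names, the distance threshold, and the pinhole intrinsics (`focal`, `cx`, `cy`) are all illustrative assumptions, not values from the paper or from KITTI calibration files.

```python
from collections import deque


def euclidean_clusters(points, radius=0.5, min_points=3):
    """Group 3D points connected by chains of neighbours within `radius` metres.

    A plain breadth-first Euclidean clustering, used here as a stand-in for
    whatever clustering the paper actually applies.
    """
    unvisited = set(range(len(points)))
    clusters = []
    while unvisited:
        seed = unvisited.pop()
        queue, members = deque([seed]), [seed]
        while queue:
            i = queue.popleft()
            near = [
                j for j in unvisited
                if sum((a - b) ** 2 for a, b in zip(points[i], points[j])) <= radius ** 2
            ]
            for j in near:
                unvisited.remove(j)
                queue.append(j)
                members.append(j)
        # Very small groups are treated as noise and dropped.
        if len(members) >= min_points:
            clusters.append([points[k] for k in members])
    return clusters


def project_to_box(cluster, focal=700.0, cx=600.0, cy=180.0):
    """Project camera-frame points (x right, y down, z forward, z > 0) with a
    pinhole model and return the enclosing box (u_min, v_min, u_max, v_max)."""
    us = [focal * x / z + cx for x, y, z in cluster]
    vs = [focal * y / z + cy for x, y, z in cluster]
    return (min(us), min(vs), max(us), max(vs))
```

Each box returned by `project_to_box` would then be cropped from the image and passed to the CNN classifier. The quadratic neighbour search keeps the sketch short; a spatial index such as a k-d tree would be the usual choice for full point clouds.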

arXiv.org e-Print Archive

Infoscience - École polytechnique fédérale de Lausanne

Crossref

Optimality in multiple comparison procedures

Author: Meskaldji Djalel Eddine
Morgenthaler Stephan
Thiran Jean-Philippe
Publication date: 09/07/2013

When many (m) null hypotheses are tested with a single dataset, the control of the number of false rejections is often the principal consideration. Two popular controlling rates are the probability of making at least one false discovery (FWER) and the expected fraction of false discoveries among all rejections (FDR). Scaled multiple comparison error rates form a new family that bridges the gap between these two extremes. For example, the Scaled Expected Value (SEV) limits the number of false positives relative to an arbitrary increasing function of the number of rejections, that is, E(FP/s(R)). We discuss the problem of how to choose in practice which procedure to use, with elements of an optimality theory, by considering the number of false rejections FP separately from the number of correct rejections TP. Using this framework we will show how to choose an element in the new family mentioned above.

Comment: arXiv admin note: text overlap with arXiv:1112.451
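To make the two extremes named in the abstract concrete, here is a small sketch of two standard textbook procedures: the Bonferroni correction, which controls the FWER, and the Benjamini-Hochberg step-up procedure, which controls the FDR under independence. These are not the scaled procedures proposed in the paper; the function names and the example p-values are illustrative assumptions.

```python
def bonferroni(pvalues, alpha=0.05):
    """Reject H_i when p_i <= alpha / m; controls the FWER at level alpha."""
    m = len(pvalues)
    return [p <= alpha / m for p in pvalues]


def benjamini_hochberg(pvalues, alpha=0.05):
    """Step-up procedure: find the largest rank k with p_(k) <= alpha * k / m
    and reject the k smallest p-values; controls the FDR under independence."""
    m = len(pvalues)
    order = sorted(range(m), key=lambda i: pvalues[i])
    k = 0
    for rank, i in enumerate(order, start=1):
        if pvalues[i] <= alpha * rank / m:
            k = rank
    rejected = [False] * m
    for i in order[:k]:
        rejected[i] = True
    return rejected


# Hypothetical p-values, for illustration only.
pvals = [0.001, 0.008, 0.039, 0.041, 0.042, 0.06, 0.074, 0.205, 0.212, 0.216]
n_fwer = sum(bonferroni(pvals))          # rejections under FWER control
n_fdr = sum(benjamini_hochberg(pvals))   # rejections under FDR control
```

On these values the FWER-controlling rule rejects fewer hypotheses than the FDR-controlling rule, which is the trade-off the scaled error rates are meant to interpolate.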

arXiv.org e-Print Archive

Infoscience - École polytechnique fédérale de Lausanne

a combined top-down and bottom-up approach

Author: Hainaut J.-L.
Thiran Philippe
Publication date: 01/01/2001

The thesis focuses on the interoperability of autonomous legacy databases with the aim of meeting the actual requirements of an organization. Interoperability is achieved by combining top-down and bottom-up strategies. The legacy objects are extracted from the existing databases through a database reverse engineering process. The business objects are defined by both the organization's requirements and the integration of the legacy objects.

Institutional Repository of the Freie Universität Berlin

Using Photorealistic Face Synthesis and Domain Adaptation to Improve Facial Expression Analysis

Author: Bozorgtabar Behzad
Ekenel Hazım Kemal
Rad Mohammad Saeed
Thiran Jean-Philippe
Publication date: 14/04/2019

Synthesizing realistic faces across domains to train deep models has attracted increasing attention in facial expression analysis, as it helps improve expression recognition accuracy when only a small number of real training images is available. However, learning from synthetic face images can be problematic because of the distribution discrepancy between low-quality synthetic images and real face images, and the learned model may not achieve the desired performance when applied to real-world scenarios. To this end, we propose a new attribute-guided face image synthesis method that performs translation between multiple image domains using a single model. In addition, we adapt the proposed model to learn from synthetic faces by matching the feature distributions between different domains while preserving each domain's characteristics. We evaluate the effectiveness of the proposed approach at generating realistic face images on several face datasets. We demonstrate that expression recognition performance can be enhanced by our face synthesis model. Moreover, we also conduct experiments on a near-infrared dataset containing facial expression videos of drivers to assess performance on in-the-wild data for driver emotion recognition.

Comment: 8 pages, 8 figures, 5 tables, accepted by FG 2019. arXiv admin note: substantial text overlap with arXiv:1905.0028

arXiv.org e-Print Archive

Infoscience - École polytechnique fédérale de Lausanne

Crossref

Integration of Legacy and Heterogeneous Databases

Author: Hainaut Jean-Luc
Thiran Philippe
Publication venue: Institut d'Informatique - LIBD
Publication date: 01/01/2002

Repository of the University of Namur

Combining Multiple Views for Visual Speech Recognition

Author: Ekenel Hazım Kemal
Ghazi Mostafa Mehdipour
Thiran Jean-Philippe
Zimmermann Marina
Publication date: 07/07/2017

Visual speech recognition is a challenging research problem with a particular practical application of aiding audio speech recognition in noisy scenarios. Multiple camera setups can be beneficial for the visual speech recognition systems in terms of improved performance and robustness. In this paper, we explore this aspect and provide a comprehensive study on combining multiple views for visual speech recognition. The thorough analysis covers fusion of all possible view angle combinations both at feature level and decision level. The employed visual speech recognition system in this study extracts features through a PCA-based convolutional neural network, followed by an LSTM network. Finally, these features are processed in a tandem system, being fed into a GMM-HMM scheme. The decision fusion acts after this point by combining the Viterbi path log-likelihoods. The results show that the complementary information contained in recordings from different view angles improves the results significantly. For example, the sentence correctness on the test set is increased from 76% for the highest performing single view ($30^\circ$) to up to 83% when combining this view with the frontal and $60^\circ$ view angles.
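The abstract states that decision fusion combines Viterbi path log-likelihoods but does not give the combination rule. A minimal sketch, assuming a (optionally weighted) sum of per-view log-likelihoods over a shared set of candidate hypotheses, could look as follows; the function name, the weights, and the scores are hypothetical.

```python
def fuse_decisions(view_scores, weights=None):
    """Pick the hypothesis with the highest combined log-likelihood.

    view_scores: one dict per camera view, mapping hypothesis -> log-likelihood.
    weights:     optional per-view weights (assumed equal if omitted).
    """
    if weights is None:
        weights = [1.0] * len(view_scores)
    # Only hypotheses scored by every view can be fused.
    common = set(view_scores[0])
    for scores in view_scores[1:]:
        common &= set(scores)
    fused = {
        h: sum(w * scores[h] for w, scores in zip(weights, view_scores))
        for h in common
    }
    best = max(fused, key=fused.get)
    return best, fused


# Made-up log-likelihoods for two candidate sentences from three views.
frontal = {"yes": -120.0, "no": -118.0}
view_30 = {"yes": -110.0, "no": -119.0}
view_60 = {"yes": -115.0, "no": -116.0}
best, fused = fuse_decisions([frontal, view_30, view_60])
```

In this toy case the frontal view alone prefers "no", while the fused score prefers "yes", which illustrates how complementary views can change the decision.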

arXiv.org e-Print Archive

Infoscience - École polytechnique fédérale de Lausanne

Crossref

Preface

Author: Thiran Philippe
Risch Tore
Benslimane Djamal
Publication venue: Elsevier B.V.
Publication date: 01/01/2006

Elsevier - Publisher Connector

Crossref

Learn to synthesize and synthesize to learn

Author: Bozorgtabar Behzad
Ekenel Hazım Kemal
Rad Mohammad Saeed
Thiran Jean-Philippe
Publication date: 01/05/2019

Attribute-guided face image synthesis aims to manipulate attributes on a face image. Most existing methods for image-to-image translation can either perform a fixed translation between any two image domains using a single attribute or require training data with the attributes of interest for each subject. Therefore, these methods can only train one specific model for each pair of image domains, which limits their ability to deal with more than two domains. Another disadvantage of these methods is that they often suffer from the common problem of mode collapse, which degrades the quality of the generated images. To overcome these shortcomings, we propose an attribute-guided face image generation method using a single model, which is capable of synthesizing multiple photo-realistic face images conditioned on the attributes of interest. In addition, we adapt the proposed model to increase the realism of the simulated face images while preserving the face characteristics. Compared to existing models, synthetic face images generated by our method show good photorealistic quality on several face datasets. Finally, we demonstrate that generated facial images can be used for synthetic data augmentation and improve the performance of the classifier used for facial expression recognition.

Comment: Accepted to Computer Vision and Image Understanding (CVIU)

arXiv.org e-Print Archive

Infoscience - École polytechnique fédérale de Lausanne